ABSTRACT


This paper presents findings on the implementation and impacts of Leveled Literacy 
Intervention (LLI) in Oakland, California, where the school district conducted the nation’s first 
randomized controlled trial of LLI in secondary grades. LLI is a short-term, intensive 
intervention designed to help teachers provide small-group instruction to struggling readers. 
Many school districts across the country have used LLI, which research evidence has shown to 
rapidly improve outcomes for students in early elementary grades. During the trial, secondary 
schools in Oakland faced various challenges implementing LLI, leading students to experience 
different levels of LLI duration, intensity, and fidelity. LLI had no impact on students’ reading 
comprehension and a negative impact on their mastery of English language arts/literacy 
standards. Students who were pulled out of other classes to receive LLI were particularly 
negatively affected, possibly as a result of missing grade-level content. This study’s findings 
highlight challenges in implementing effective literacy interventions for struggling adolescent 
readers. 


The research reported here was supported by the Institute of Education Sciences, U.S. 
Department of Education, through Grant R305L160003 to the Oakland Unified School District. 
The opinions expressed are those of the authors and do not represent views of the Institute or the 
U.S. Department of Education. The authors gratefully acknowledge the district and school staff 
who supported this study, including Jean Wing, Nancy Lai, Abbey Kerins, and Rinat Fried. We 
also thank Brian Gill (Mathematica Policy Research) and Eric Isenberg (American Institutes for 
Research) for their contributions to this study. Finally, we are grateful to the grant’s project 
officer, Allen Ruby, for providing helpful suggestions throughout the project. 
I. INTRODUCTION


Too many American youth leave high school without the literacy skills that colleges and 
employers demand. In 2015, 28 percent of 12th graders in the United States performed below the 
basic level on the National Assessment of Educational Progress, meaning that those students had 
less than partial mastery of the knowledge and skills that are fundamental for reading at their 
grade level (U.S. Department of Education 2018). Because literacy is associated with greater job 
opportunities and higher incomes, the stakes are high for finding effective ways to accelerate
these students’ progress. Adults with below-basic literacy are 16.5 times more likely to receive public
assistance and 5 times more likely to earn less than $300 per week relative to those with the 
highest level of literacy (Wood 2010). 


Secondary schools often struggle to find effective ways to quickly accelerate the
progress of older, struggling readers. The challenge is apparent in literacy trends over the last 
decade: the average reading performance of 12th graders on the National Assessment of 
Educational Progress has been stagnant, as has the achievement gap between low-income 
students and those not eligible for free or reduced-price lunch (U.S. Department of Education 
2018). As summarized by Levin et al. (2010), “Progress in strengthening young people’s literacy 
now depends on schools a) choosing appropriate programs and b) implementing them 
consistently and effectively.” 


An ideal setting in which to study reading intervention for secondary students is the Oakland 
Unified School District (OUSD), where 52 percent of students in grades 6 through 12 scored 
multiple years below their expected reading level in 2016 on the Scholastic Reading Inventory 
(SRI) and almost one-third scored at least four years below grade level. As in other school 
districts around the country, Oakland students who are multiple years behind in reading also face 
other challenges. In OUSD, 33 percent of such students are English learners, 21 percent are in 
special education, and 88 percent are eligible for free or reduced-price lunch. 


To help these adolescent readers improve their literacy skills, OUSD invested in piloting the 
Fountas & Pinnell Leveled Literacy Intervention (LLI), an intensive reading intervention 
program. After one year of piloting the program in a small number of secondary schools, the 
district partnered with researchers to conduct the nation’s first randomized controlled trial of LLI 
in secondary grades. Many school districts across the country have used LLI, often as part of a 
Response-to-Intervention model, and the program has shown promise in rapidly improving 
outcomes for students in early elementary grades (What Works Clearinghouse 2017). This paper 
presents findings on the implementation and impacts of LLI in secondary schools. 


II. OVERVIEW OF LLI


LLI is a short-term, intensive intervention system designed to help teachers provide daily, 
small-group instruction to students who are not achieving grade-level expectations in reading. It 
is intended to supplement, rather than replace, regular literacy instruction, and draws on research 
on reading comprehension and skill acquisition, vocabulary development, oral fluency, and 
student engagement and motivation (see Heinemann 2015 for a summary of the research base for 
LLI for grades 3 through 12). LLI was developed by Irene C. Fountas and Gay Su Pinnell and is 
published by Heinemann. 


WORKING PAPER 62 MATHEMATICA POLICY RESEARCH 


Although it was originally developed for students in early elementary grades, LLI has since 
expanded across K-12. Materials are bundled in kits for different reading levels and include a 
series of fiction and nonfiction texts and sequential lesson guides of progressing difficulty.1 Odd-
numbered lessons focus on discussing and revisiting the book from the previous lesson,
phonics/word work, and reading a new book that is at students’ instructional reading level. Even-
numbered lessons focus on revisiting the book from the previous day, conducting a reading
assessment for progress monitoring, writing about the book from the previous day, phonics/word 
study, and introducing a new book that is at students’ independent reading level. Each LLI lesson 
guide also provides suggestions for supporting English learner students. 


To implement LLI, teachers begin by assessing students with the Fountas & Pinnell 
Benchmark Assessment System (F&P BAS), a one-on-one assessment that matches students’ 
instructional and independent reading abilities to the text-level gradient used by LLI. Teachers 
then form small groups of three to five students with similar assessment scores and deliver 45- 
minute daily lessons. Lessons may be adapted to 30 minutes by using guidance provided by the 
program. The recommended program length ranges from 12 weeks to 24 or more weeks and 
depends on students’ starting reading level and progress. For the starting reading levels of most 
students in this study (corresponding to grades 3 through 5), LLI recommends that students 
participate for 18 to 24 weeks. 


Before implementing the program, LLI teachers may participate in training delivered by 
Heinemann coaches. Heinemann offers various training opportunities, including three-day, on- 
site trainings that cost approximately $9,600 for up to 30 teachers. In addition to the training and 
the kits (which range in cost from $2,900 to $4,950), an important input to LLI is teacher time, 
particularly given the small student-to-teacher ratio required by the intervention. In OUSD, 
teachers were typically assigned to teach LLI half time and served two LLI groups, or about 10 
students. Based on the national average teacher salary, the cost of a half-time teacher is 
approximately $36,800 per school year.2 Other implementation costs include support from
principals and district leadership and classroom facilities. 
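
As a quick check of the arithmetic behind the teacher-time figure, the sketch below reproduces the half-time cost from the national average salary and the 25 percent fringe rate cited in the accompanying footnote. The constant and function names are illustrative assumptions, not part of the study.

```python
# Inputs taken from the paper's footnote on teacher salary; names are illustrative.
AVERAGE_TEACHER_SALARY = 58_950  # national average, 2016-2017 school year
FRINGE_RATE = 0.25               # estimated fringe benefit rate
FTE_SHARE = 0.5                  # teachers were typically assigned to LLI half time


def half_time_teacher_cost(salary: float, fringe_rate: float, fte_share: float) -> float:
    """Annual cost of a teacher's LLI time: salary plus fringe, scaled by FTE share."""
    return salary * (1 + fringe_rate) * fte_share


cost = half_time_teacher_cost(AVERAGE_TEACHER_SALARY, FRINGE_RATE, FTE_SHARE)
print(f"${cost:,.2f}")  # $36,843.75, which the text rounds to approximately $36,800
```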


LLI is based on a theory of change positing that the progress made by struggling readers 
toward the goal of reading at grade level is affected by factors related to LLI and contextual 
factors (Ransford-Kaldon et al. 2013). For instance, the program requires that teachers are 
qualified and trained to deliver the lessons, that lessons are taught with fidelity, and that the 
duration of LLI meets the needs of students to produce results. Important contextual factors that 
influence the program’s success include school-level supports for literacy instruction, the quality 
and continuity of regular classroom instruction, and the support that students receive at home. 


1 There are seven LLI kits, each one denoted by its own color. As of June 2018, the LLI Orange System
(kindergarten) cost $2,900. The Green System (grade 1) cost $3,416, the Blue System (grade 2) cost $3,324, and the
Red (grade 3), Gold (grade 4), Purple (grade 5), and Teal systems (grades 6 through 12) cost $4,950. The teachers in
this study used the Blue, Red, Gold, Purple, and Teal systems.


2 According to the U.S. Department of Education (2016), the average teacher salary in the United States in the
2016-2017 school year was $58,950. We applied a 25 percent estimated fringe benefit to this figure, following 
Levin et al. (2010). 
III. PREVIOUS LITERATURE ON LLI


Past studies indicate that LLI is effective at improving the reading skills of younger students. 
In three randomized controlled trials in diverse settings with students in kindergarten through 
grade 2, LLI produced significant gains in reading after 12 to 18 weeks, with effect sizes ranging 
between 0.39 and 0.81 standard deviations across grade levels (Ransford-Kaldon et al. 2010; 
Ransford-Kaldon et al. 2013). Other non-experimental studies have also found promising 
evidence of LLI’s effectiveness among students in early grades (e.g., Taylor 2017; Odell 2012; 
Peterman et al. 2009; and Harrison et al. 2008). A review of the research on LLI by the What 
Works Clearinghouse (2017) determined that LLI had positive effects on general reading 
achievement, potentially positive effects on reading fluency, and no discernible effects on 
alphabetics for students in kindergarten through grade 2. 


In the two studies that met the What Works Clearinghouse’s review standards, LLI was 
implemented with a high degree of fidelity (Ransford-Kaldon et al. 2010; Ransford-Kaldon et al. 
2013). Classroom observers in both studies determined that lessons were delivered as designed 
more than 95 percent of the time. In addition, teachers participated in eight days of training in 
how to implement the program and received additional professional development throughout the 
school year. In both studies, 91 percent of teachers felt that they had received adequate 
professional development for implementing LLI. Researchers also found that regular classroom 
instruction followed many of the same principles of LLI. 


At the same time, however, the studies highlighted some implementation challenges. 
Students received fewer than the recommended number of instructional sessions, although they 
still made significant gains. Several classroom observers and teachers noted that lessons were 
fast-paced and could not be completed adequately as designed within the suggested time frame. 
Staff also observed that the reading assessment activity in the even-numbered lessons took up too 
much time. Finally, when asked about logistical issues, most LLI teachers mentioned time and/or 
scheduling, particularly coordinating with classroom teachers to pull out students for LLI. 


IV. METHODS 


To determine the impact of LLI on secondary students’ reading achievement, this evaluation 
used an experimental design that randomly assigned groups of students within a school to either 
a treatment group, which received LLI, or a control group, which proceeded with business as
usual. In this section, we describe the study’s school and student recruitment, random 
assignment, data collection, and analysis methods. 


School and student recruitment 


In fall 2016, OUSD recruited 10 secondary schools that planned to implement LLI during 
the 2016-2017 academic year. The district offered schools a stipend of $110 per LLI student as 
an incentive to participate in the study. Of the more than 20 secondary schools that were 
contacted, 7 middle schools and 3 high schools agreed to participate in the study. 


Because the schools in the study were not randomly sampled, they are not necessarily 
representative of all secondary schools in OUSD or other urban school districts. Schools that 
were interested in participating had to be prepared to implement LLI, and they had to be willing
to allow students to be randomly assigned to the intervention. On average, participating schools 
had a high share of students who were eligible for free or reduced-price lunch (87 percent), were 
English learners (31 percent), and scored three or more years below grade level on the SRI in fall 
2016 (50 percent). 


Within the study schools, teachers selected students in grades 6 through 9 to be included in 
random assignment based on the same criteria they would have used to select students for LLI in 
the absence of the study. A student’s eligibility for the study was largely based on whether the 
student’s starting reading level on the F&P BAS test was multiple years below grade level and 
resembled the reading level of other students who might have been eligible for the study, so that 
homogeneous instructional groups could be formed as specified by LLI. 


Random assignment 


In fall 2016, teachers used students’ F&P BAS scores (and, if relevant, any scheduling 
constraints) to form groups of three to seven students with similar starting reading levels. The 
study team then randomly assigned the groups to treatment and control groups separately for 
each study school. Group-level random assignment allowed teachers to maintain control over 
forming the instructional groups—an important component of LLI. 


To improve the precision of the study’s impact estimates and reduce the possibility of a 
chance imbalance between the treatment and control groups, we paired, when possible, each 
group into strata (matched pairs) before random assignment on the basis of similar median pre- 
intervention F&P BAS scores. In some cases, groups had to be paired if they shared a scheduling 
constraint. Within each stratum, we randomly assigned one group to the treatment group and the 
other to the control group.3 Across all 10 study schools, 292 students in 76 groups were
randomly assigned, with 145 students assigned to the treatment group and 147 assigned to the 
control group. 
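
The pairing and within-pair randomization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study team's procedure: the text does not say how pairs were chosen beyond "similar median" scores, so the sketch simply sorts one school's groups by median baseline score and pairs neighbours. The function name, group labels, scores, seed, and the handling of an unpaired leftover group are all hypothetical.

```python
import random
from statistics import median


def assign_matched_pairs(groups: dict[str, list[int]], seed: int) -> dict[str, str]:
    """Pair groups with adjacent median baseline scores, then randomize within pairs.

    `groups` maps a hypothetical group label to its students' numeric baseline
    scores for one school. A leftover unpaired group is randomized on its own
    (an assumption; the paper only says groups were paired "when possible").
    """
    rng = random.Random(seed)
    # Order groups by median baseline score so neighbours are the closest matches.
    ordered = sorted(groups, key=lambda g: median(groups[g]))
    assignment: dict[str, str] = {}
    for i in range(0, len(ordered) - 1, 2):
        pair = [ordered[i], ordered[i + 1]]
        rng.shuffle(pair)  # coin flip within the stratum
        assignment[pair[0]] = "treatment"
        assignment[pair[1]] = "control"
    if len(ordered) % 2 == 1:
        assignment[ordered[-1]] = rng.choice(["treatment", "control"])
    return assignment


# Made-up groups and scores for a single school, for illustration only.
school_groups = {
    "A": [10, 11, 12],
    "B": [17, 18, 18, 19],
    "C": [11, 12, 12],
    "D": [16, 18, 19],
}
result = assign_matched_pairs(school_groups, seed=2016)
```

Each stratum ends up with exactly one treatment group and one control group, which is what keeps the two arms balanced on baseline reading level.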


As expected, students selected for the study were low-performing students relative to the 
rest of the district. For example, their average z-score on the fall 2016 SRI was -0.54 standard 
deviations. Stated differently, they scored 3.8 years below their expected grade level on average. 
Students in higher grade levels tended to be further behind—on average, students in grade 6 
scored 3.0 years below grade level on the SRI, while students in grade 9 scored 5.1 years below 
grade level. Students selected for the study also had low performance on the spring 2016 Smarter 
Balanced Assessment Consortium (SBAC) ELA/literacy test, with only 3.6 percent meeting 
standards. 


In Table 1, we present baseline test scores and demographic characteristics of students by 
treatment status. By chance, students randomly assigned to the treatment group had slightly 


3 One school submitted 22 students for random assignment without groupings or F&P BAS scores. We paired 
students by using their baseline SRI score and assigned one student in each pair to treatment. The teacher used the 
students’ F&P BAS scores to form LLI groups from the 11 students assigned to treatment. In general, teachers could 
move treatment students into different group configurations after random assignment. Students in groups assigned to 
the control condition did not remain in those group configurations, as they did not receive LLI. For these reasons, 
the randomly assigned groups did not necessarily reflect instructional groups. 
lower baseline academic performance than that of control students, although only one
difference, on the fall 2016 Scholastic Math Inventory (SMI), is statistically significant at the
5 percent level. Overall, students in both the treatment and control groups had similar 
demographic characteristics. The differences in baseline measures are not jointly statistically 
significant. 


Table 1. Baseline student characteristics


                                                   Treatment              Control          Standardized
                                                                                            difference
Baseline characteristic                        Mean (SD)      N       Mean (SD)      N      (percent)
Fall 2016 F&P BAS (numeric score, 1-26)       [illegible]    134     [illegible]    136        23.8
Fall 2016 SRI (z-score)                       [illegible]    143     [illegible]    146       -18.7
Fall 2016 SMI (z-score)                       [illegible]    113     [illegible]    123        27.4*
Spring 2016 SBAC ELA/literacy (z-score)       [illegible]    126     [illegible]    123       -21.5
Spring 2016 SBAC math (z-score)               [illegible]    126     [illegible]    125        10.6
African American (%)                             40.0        145        35.4        144         9.4
Asian/Filipino/Pacific Islander (%)              6.21        145        9.72        144       -13.0
Latino (%)                                       48.3        145        50.0        144        -3.4
Male (%)                                         51.7        145        53.4        146        -3.4
Eligible for free/reduced-price lunch (%)        92.4        145        92.5        147        -0.2
English learner (%)                              34.5        145        26.7        146        16.9
Special education (%)                            11.7        145        10.3        147         4.6


Source: Authors’ calculations using data provided by OUSD. 


Notes: Standard deviations are displayed in parentheses. F&P letter scores were converted to numeric scores 
(i.e., 17 corresponds to the letter Q). SRI, SMI, and SBAC scores were converted to z-scores defined 
relative to OUSD’s distribution of scores by test, administration, and grade. 


* Difference is statistically significant at the 5 percent level. 


Some students who were randomly assigned did not take the relevant outcome assessments 
in spring 2017 (the SRI and the SBAC ELA/literacy test).4 Overall attrition was approximately 
10 percent for the SRI and 6 percent for the SBAC ELA/literacy test. The differences in attrition 
rates between the treatment and control groups are small and not statistically significant (Table 
2). After attrition, the treatment and control groups in each of the two analytic samples
show patterns in baseline characteristics similar to those of the randomized sample and remain
balanced overall (Appendix Tables A.1 and A.2).


4 The SBAC is offered to students only in grades 6 through 8, so the analysis of ELA/literacy scores is based on 
those grades while the analysis of SRI scores is based on the larger sample of students in grades 6 through 9. 
Table 2. Sample sizes and attrition


                                                              Overall      Differential
                                                              attrition    attrition
                                 Treatment  Control   Total   (percent)    (percent)
Randomly assigned                   145       147      292
In SRI analysis                     128       135      263       9.9          3.6
In SBAC ELA/literacy analysis        95        97      192       5.9          3.9


Source: Authors’ calculations using data provided by OUSD.
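
The SRI attrition figures in Table 2 can be reproduced from the sample counts alone. The sketch below assumes that overall attrition is the share of randomized students missing from the analysis sample and that differential attrition is the absolute gap between the two arms' attrition rates. The function name is illustrative. The SBAC row is not reproduced because its denominator, the number of randomized students in grades 6 through 8, is not reported in the table.

```python
def attrition_rates(assigned_t: int, assigned_c: int, analyzed_t: int, analyzed_c: int):
    """Return (overall, differential) attrition, both in percent."""
    overall = 1 - (analyzed_t + analyzed_c) / (assigned_t + assigned_c)
    rate_t = 1 - analyzed_t / assigned_t  # treatment-arm attrition
    rate_c = 1 - analyzed_c / assigned_c  # control-arm attrition
    return 100 * overall, 100 * abs(rate_t - rate_c)


# Counts from the "Randomly assigned" and "In SRI analysis" rows of Table 2.
overall, differential = attrition_rates(145, 147, 128, 135)
print(round(overall, 1), round(differential, 1))  # 9.9 3.6
```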
Data collection 


To analyze impacts, we used student achievement and demographic data collected by
OUSD in the 2015-2016 and 2016-2017 school years. As mentioned above, we studied student 
achievement on the SRI and the SBAC ELA/literacy assessments. The district administered the 
SRI to students in all grades in fall and spring of each school year. Using written materials 
sampled from various content areas, the SRI measures how well students can read and 
comprehend literary and expository texts. In spring of each school year, students in grades 6 
through 8 also took the SBAC, which tests mastery of grade-level standards in ELA/literacy and 
mathematics. Both assessments are aligned to Common Core State Standards and are computer-
adaptive.5


The use of reading comprehension and general literacy achievement to assess the impact of 
LLI in this study is motivated by earlier research on adolescent literacy. Comprehension is seen 
as a primary challenge among struggling adolescent readers (Kamil et al. 2008). General literacy 
achievement, which on the SBAC includes reading comprehension, writing, listening, and 
research/inquiry components, has been linked to labor market outcomes later in life (Wood 
2010). 


To study implementation, we used LLI attendance logs and other implementation data 
provided by OUSD. LLI teachers tracked student attendance in every session, start and exit 
dates, crossover, and any literacy supports provided to students in the control group. The study’s 
school liaison observed each LLI teacher once between December 2016 and March 2017 and 
rated the fidelity of each lesson component on a scale of 0 (not observed) to 3 (high fidelity) by 
using a version of a fidelity rubric developed by Heinemann. The school liaison also collected 
information about each teacher’s implementation of LLI, including the setting (e.g., pullout 
versus in-class), the number of days per week that LLI was offered, and session length. 


Data analysis 


This study aimed to learn about the implementation of LLI in secondary schools and 
evaluate LLI’s impact on student literacy outcomes. For the implementation analysis, we 
conducted descriptive quantitative and qualitative analyses of the implementation data collected. 
Below, we describe in greater detail the methods used for the impact analysis. 


5 The SBAC ELA/literacy test also contains a performance task component.
Intent-to-treat analysis


In the random assignment design used in this study, the simple difference between the 
outcomes of treatment and control students is an unbiased estimate of the impact of LLI. 
However, to improve the statistical precision of the estimates, we measured the treatment-control 
difference after controlling statistically for small, random differences in the baseline achievement 
and characteristics of students. Specifically, we controlled for baseline achievement on the SRI 
and SBAC ELA/literacy and math tests from spring 2016 and the SRI and SMI from fall 2016, 
all administered before the beginning of treatment. In addition, we controlled for the following 
student characteristics: gender, race/ethnicity, eligibility for free or reduced-price lunch, English 
learner status, special education status, and grade level. 


Accordingly, we estimated student impacts using the following model: 
(1) $y_{ijk} = \alpha_k + \beta T_{jk} + X_{ijk}\gamma + u_{jk} + \epsilon_{ijk}$


where $y_{ijk}$ is the spring 2017 SRI or SBAC ELA/literacy score of student i in group j within
stratum k; $\alpha_k$ is a vector of stratum (matched pair) fixed effects; $T_{jk}$ is a treatment indicator that
equals 1 if the group was assigned to receive LLI and 0 otherwise; $X_{ijk}$ is a vector of baseline
student test scores and demographic characteristics; $u_{jk}$ is a group-specific random error term;
and $\epsilon_{ijk}$ is an individual-level random error term. $\gamma$ is a vector of parameters to be estimated, and $\beta$
is the key parameter measuring the impact of assignment to LLI, or the intent-to-treat (ITT)
impact, on student achievement. We estimated the model with ordinary least squares using
standard errors that account for group-level clustering.
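
To illustrate how the stratum fixed effects shape the ITT estimate, the sketch below computes the treatment coefficient for a stripped-down version of equation (1) that omits the baseline covariates. In that special case, ordinary least squares with stratum fixed effects and a binary treatment indicator is algebraically a weighted average of within-stratum treatment-control mean differences, with weights n_T * n_C / (n_T + n_C). The function name and the scores are made up, and the sketch does not compute the clustered standard errors used in the paper.

```python
from statistics import mean


def itt_estimate(strata):
    """Weighted average of within-stratum treatment-control mean differences.

    `strata` is a list of (treatment_scores, control_scores) tuples, one per
    matched pair. The weights n_T * n_C / (n_T + n_C) are the ones that OLS with
    stratum fixed effects and a binary treatment indicator applies implicitly
    when there are no other covariates.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for treated, control in strata:
        n_t, n_c = len(treated), len(control)
        weight = n_t * n_c / (n_t + n_c)
        weighted_sum += weight * (mean(treated) - mean(control))
        total_weight += weight
    return weighted_sum / total_weight


# Made-up outcome scores for two matched pairs, for illustration only.
example_strata = [
    ([1.0, 3.0], [0.0, 2.0]),  # within-pair difference 1.0, weight 1.0
    ([4.0], [1.0, 1.0, 1.0]),  # within-pair difference 3.0, weight 0.75
]
beta_hat = itt_estimate(example_strata)
```

Pairs with more balanced and larger arms receive more weight, so a lopsided pair contributes less to the pooled estimate.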


Treatment-on-the-treated analysis 


We also conducted a treatment-on-the-treated (TOT) analysis, which estimated the impact of 
receiving LLI, accounting for the fact that some students randomly assigned to LLI did not 
actually receive it and that one student randomly assigned to the control group nonetheless 
received LLI. To estimate TOT impacts, we used students’ group assignment status $T_{jk}$ as an
instrumental variable for the receipt of LLI in a two-stage least squares regression. We defined 
whether a student received LLI based on whether they attended at least four sessions (or one 
week’s worth).6 Random assignment status is a valid instrument because it strongly predicts
receipt of LLI and affects student outcomes only through receipt of LLI. 


The first stage regression (equation 2) estimates the effect of random assignment to LLI on 
the probability of receiving LLI, and the second stage (equation 3) estimates the impact of 
receiving LLI on outcomes, given by the coefficient $\delta$.


(2) $\text{ReceivedLLI}_{ijk} = \pi_k + \lambda T_{jk} + \omega_{jk} + \nu_{ijk}$


6 Under this definition, 5 students in the treatment group who attended just one session are not considered to have
received LLI. We also considered an alternative definition of receipt based on whether students attended the 
minimum number of lessons recommended by LLI. However, random assignment status may not be a valid 
instrument in this case, as it also predicts whether students received fewer than the recommended number of 
sessions, which could also affect reading outcomes. 
(3) $y_{ijk} = \alpha_k + \delta\,\text{ReceivedLLI}_{ijk} + X_{ijk}\gamma + u_{jk} + \epsilon_{ijk}$.
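
The logic of the two-stage estimate can be seen in its simplest form. With a single binary instrument and no covariates or stratum fixed effects, two-stage least squares reduces to the Wald ratio: the ITT impact on the outcome divided by the treatment-control difference in the rate of receiving LLI. The sketch below uses that simplification, which is an assumption made for illustration and not the covariate-adjusted model estimated in the paper. The function names and data are hypothetical; only the four-session receipt threshold comes from the text.

```python
from statistics import mean

MIN_SESSIONS = 4  # receipt threshold stated in the text (one week's worth)


def received_lli(sessions_attended):
    """Binary receipt indicators: 1 if a student attended at least MIN_SESSIONS."""
    return [1 if s >= MIN_SESSIONS else 0 for s in sessions_attended]


def wald_tot(outcome_t, outcome_c, sessions_t, sessions_c):
    """TOT estimate as the ITT impact divided by the treatment-control gap in receipt."""
    itt = mean(outcome_t) - mean(outcome_c)
    first_stage = mean(received_lli(sessions_t)) - mean(received_lli(sessions_c))
    if first_stage == 0:
        raise ValueError("assignment does not shift receipt; TOT is undefined")
    return itt / first_stage


# Made-up z-scores and attendance counts, for illustration only.
tot = wald_tot(
    outcome_t=[0.2, -0.1, 0.0, 0.3],
    outcome_c=[0.0, -0.2, 0.1, -0.1],
    sessions_t=[30, 12, 4, 1],
    sessions_c=[0, 0, 0, 0],
)
```

Because fewer than all assigned students receive LLI, the divisor is below 1 and the TOT estimate is larger in magnitude than the ITT estimate.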


Subgroup analysis 


The estimation of overall impacts may mask differences in impacts by factors such as 
student characteristics or fidelity of implementation. As an exploratory analysis, we tested 
whether the impacts of LLI varied on key dimensions, including whether (1) students were 
English learners; (2) students were three or more years behind grade level on the fall 2016 SRI; 
(3) LLI was taught in pullout groups rather than in scheduled classes; (4) teachers had any 
experience with LLI; and (5) LLI was taught with high fidelity.7 These analyses are exploratory
because of limited sample sizes within subgroups and the fact that types of LLI classes or 
teachers were not randomly assigned to students. 


For each subgroup of interest, we estimated the following regression model, which adds an 
interaction term to the benchmark model in equation (1): 


(4) $y_{ijk} = \alpha_k + \beta_1 T_{jk} + \beta_2 (T_{jk} \times W_{ijk}) + X_{ijk}\gamma + u_{jk} + \epsilon_{ijk}$.


$W_{ijk}$ represents the relevant variable for the hypothesis being tested (for example, an indicator for
whether the student was an English learner). Given that W is always a binary indicator, the
coefficient on the interaction term, $\beta_2$, represents how the impact for members of that subgroup
differs from the impact for others, captured by $\beta_1$.
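
To make this interpretation concrete, write $\beta_1$ for the coefficient on the treatment indicator and $\beta_2$ for the coefficient on the interaction term. The implied impacts for the two subgroups then follow directly from equation (4), holding the other terms fixed. This is a restatement of the model rather than an additional result:

```latex
\begin{align*}
\text{Impact when } W_{ijk} = 0 &: \quad \beta_1 \\
\text{Impact when } W_{ijk} = 1 &: \quad \beta_1 + \beta_2
\end{align*}
```

A test of $\beta_2 = 0$ is therefore a test of whether the impact differs between the two subgroups.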


Missing baseline data 


Data on one or more of the variables used as baseline controls were missing for about 27 
percent of students. As described above, we included five baseline assessments as controls to 
improve the statistical precision of the estimates. Most commonly, students were missing scores 
on the spring 2016 SRI (21 percent) and fall 2016 SMI (19 percent). To address the missing data 
problem, we employed a multiple imputation procedure to estimate students’ missing baseline 
values. Simulations suggest that, for randomized controlled trials, this approach results in limited 
bias (defined as less than 0.05 standard deviations) if missing data are not related to an 
unobserved variable that is correlated with the outcome (Puma et al. 2009).8


We imputed missing baseline test scores separately by treatment or control status by using a 
chained linear equations model that included all outcome variables and all student characteristic 
variables in the final impact regressions. We excluded students from the imputation model if 
they had missing data for both outcome test scores. This restriction excluded 25 students, or 8.6 
percent of the randomized sample, with no outcome scores (15 in the treatment group and 10 in 
the control group). We then conducted the impact regression analyses described above in M = 
10 imputed data sets. 


7 Teachers were observed once and rated from 0 to 3 on each LLI lesson component. Those component ratings were 
then averaged for each teacher. The average of the overall ratings was 1.4. High fidelity was defined as an average 
fidelity rating of 1.5 or higher. 


8 Although rates of missing baseline data were similar between the treatment and control groups, we added to the
impact regressions indicators for each baseline assessment that was imputed. 


After collecting coefficient and standard error estimates from each of the 10 imputed data
sets, we computed multiple imputation coefficients and standard errors using Rubin’s
combination method (Rubin 1987), where the multiple imputation coefficient ($\beta_{MI}$) is the
average of the coefficient values from each imputed data set ($\beta_m$) and the multiple imputation
standard error ($SE_{MI}$) is the square root of the within-imputation coefficient variance plus the
between-imputation coefficient variance inflated by a finite imputation correction multiplier:


$SE_{MI} = \sqrt{\frac{1}{M}\sum_{m=1}^{M} SE_m^2 + \left(1 + \frac{1}{M}\right)\frac{1}{M-1}\sum_{m=1}^{M}\left(\beta_m - \beta_{MI}\right)^2}$
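
The combination step described above can be sketched in a few lines. The function below takes the per-imputation coefficients and standard errors and applies Rubin's rules as stated in the text: the pooled coefficient is the simple average, and the pooled variance is the average squared standard error (within) plus the sample variance of the coefficients (between) multiplied by 1 + 1/M. The function name and the input values are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def rubin_combine(betas, standard_errors):
    """Pool per-imputation estimates into one coefficient and standard error."""
    m = len(betas)
    beta_mi = mean(betas)
    within = mean([se ** 2 for se in standard_errors])  # average sampling variance
    between = variance(betas)                           # sample variance, divides by m - 1
    se_mi = sqrt(within + (1 + 1 / m) * between)
    return beta_mi, se_mi


# Made-up estimates from five imputed data sets, for illustration only.
beta_mi, se_mi = rubin_combine(
    betas=[0.10, 0.12, 0.08, 0.11, 0.09],
    standard_errors=[0.05, 0.05, 0.05, 0.05, 0.05],
)
```

The pooled standard error is never smaller than the root of the average within-imputation variance, which reflects the added uncertainty from imputing the missing baseline scores.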


To test the sensitivity of the results to these imputation methods, we estimated a version of 
equation (1) with unimputed baseline data but only the fall 2016 SRI as a control for prior
achievement, which no students in the analytic samples were missing. 


Results 


In summary, schools faced various implementation challenges, leading students to receive 
different levels of LLI duration, intensity, and instructional fidelity. LLI had no impact on 
students’ reading comprehension and a negative impact on their mastery of ELA/literacy 
standards. 


Implementation findings 


Students experienced different levels of LLI duration and intensity, and most fell short of the 
recommended minimum number of sessions. The average student assigned to LLI participated in 
the program for 16 weeks, although the duration varied significantly across students (Figure 1). 
School-level factors, such as the program start and end date, and individual student factors, such 
as the decision to stop participating in LLI, affected how long students remained in the program. 
In addition, very few students received the recommended intensity of four or five days per week 
(Figure 1). Student attendance was inconsistent—on average, students assigned to LLI attended 
2.1 out of 3.8 sessions offered per week—and only 2 of the 10 study schools offered daily 
sessions. As a result, 63 percent of students assigned to LLI received fewer than the total 
recommended minimum number of sessions. This includes 17 percent of students assigned to 
treatment who did not receive any LLI, the majority of whom were concentrated in one school. 


High schools particularly struggled with student participation and engagement. Thirty-six 
percent of high school students assigned to LLI did not receive LLI compared to 10 percent of 
students in grades 6 through 8. In most cases, the disparity reflected high school students’ (or 
their families’) refusal to participate in the program. High schoolers also attended significantly 
fewer sessions than students in middle grades (17 compared to 49, on average). Teachers at each 
of the three high schools in the study reported challenges with student attendance or engagement, 
particularly with groups pulled out of other classes. For example, one teacher reported that 
students resented leaving a creative writing class they liked for LLI. Another teacher had to 
terminate the program in early February because of chronically low student attendance despite 
initial plans to offer LLI through the end of the school year. 


Figure 1. Program duration and intensity for students assigned to LLI
Source: Authors’ calculations using data provided by OUSD.


Program start and end dates varied widely across schools. Although most schools planned to 
start offering LLI in the fall, actual start dates ranged from October to February. Schools 
reported that securing the appropriate LLI materials, receiving teacher training, and completing 
the initial one-on-one student assessments sometimes took longer than anticipated. Schools’ end 
dates also varied from February to June. In seven schools, the duration of LLI was fixed (for 
example, all of second semester), while in the other three schools in the study, LLI was offered 
through the end of the school year and the duration depended on individual student progress, 
though exit criteria varied or were not always specified. On average, schools began the program 
in early December and ended in mid-April. 


Scheduling LLI classes proved difficult for some schools, and pulling students out of other 
classes presented challenges. Of the 10 study schools, 5 offered LLI in a scheduled class, 4 
pulled students out of other classes, and 1 used both approaches. Some schools noted that it
was difficult to create a regularly scheduled LLI class because LLI serves a small number of 
students and their participation is not determined until after the start of the school year. Although 
offering LLI in pullout groups was logistically easier, students had to miss other classes and 
make up the work, which concerned some students, parents, and classroom teachers. Some LLI 
teachers tried to minimize this burden by holding sessions at a different time each day. Students 
in pullout groups were more likely to refuse LLI and attend fewer sessions than those in 
regularly scheduled classes, particularly at the high school level. On average, students in pullout 
groups attended 28 sessions, while students in regularly scheduled classes attended 47 sessions. 


Instructional fidelity was relatively low, primarily because of skipped or modified lesson 
components. The average teacher received a rating of 1.4 on a scale of 0 to 3 (a rating of 1 
indicates low fidelity); ratings ranged from 0.6 to 2.5 across the 20 teachers in the study.
The observed lessons varied in length from 30 to 55 minutes and on average lasted 43 minutes. A 
typical LLI lesson is fast-paced and involves several components, with most components being 5 
to 10 minutes long. Many teachers struggled to complete all of the components in a day’s lesson 
within the allotted time and either skipped components or completed some components in the
following session. Across the 20 observations, 11 teachers were observed continuing a lesson or
finishing a book from the previous day and 9 were observed skipping components altogether. For 
the components that were taught, the overall fidelity rating was 2.3 (a rating of 2 indicates 
medium fidelity). 


The most commonly skipped components were phonics/word study and reading and 
assessment (Figure 2). Although teachers sometimes skipped phonics for timing reasons, some 
felt that the content was not appropriate for the needs of their older students. The assessment 
component presented different challenges. In addition to time constraints, some teachers reported 
that they struggled to keep the rest of the group on task during one-on-one assessments. 
Conducting these assessments also required experience or training, familiarity with LLI’s online 
resources, and time to download and print the forms in advance. These requirements presented 
barriers to teachers new to LLI and even to experienced teachers with limited planning time. 


Figure 2. LLI lesson fidelity
Source: Authors’ calculations using data provided by OUSD.


Finally, it was common for teachers to modify the lesson guides in other ways, such as by 
reorganizing the components or deviating from the suggested language. For example, two 
teachers combined the reading and assessment component with the writing about reading 
component because they felt that the writing activity was a better way to keep other students 
occupied while they conducted one-on-one assessments. 


Although most teachers were new to LLI, not all of them received sufficient training in the 
program. Of the 20 teachers in the study, 14 received training, typically in a two-day session led 
by a Heinemann representative, a half-day training led by OUSD central staff, or both. Only 1
of the 6 teachers who did not receive training had experience with LLI. Overall, 20 percent of 
teachers had experience with the program. The fidelity data suggest that these training 
opportunities might not have sufficiently prepared all teachers to conduct LLI as designed. 
During classroom observations, some teachers were found to need reminders about various
aspects of implementation, including lesson timing, suggested modifications for shorter sessions,
how to find and use the online resources to prepare lessons in advance, how to assess students 
regularly, and how to conduct the lesson components as written. 


Impact findings 


The results of the regression analysis indicate that LLI had no discernible impact on 
students’ reading comprehension and a statistically significant, negative impact on their mastery 
of ELA/literacy standards (Table 3). The estimated effect size of being assigned to LLI on SRI 
scores is -0.02 standard deviations (p-value = 0.67), and the estimated effect size on SBAC 
ELA/literacy scores is -0.15 standard deviations (p-value = 0.01). These results are not sensitive 
to using unimputed baseline data and fewer covariates (Appendix Table A.4).⁹


Table 3. ITT and TOT impact estimates in effect sizes 


                                  SRI         SBAC ELA/literacy
Impact of assignment to LLI       -0.018      -0.154*
                                  (0.044)     (0.062)
Impact of receipt of any LLI      -0.021      -0.163**
                                  (0.045)     (0.059)
Number of students analyzed       263         192


Source: Authors’ calculations using data provided by OUSD. 


Notes: This table displays impact estimates in z-scores (standard deviations). Standard errors are displayed in 
parentheses below each impact estimate. 


* Impact is statistically significant at the 5 percent level. 
** Impact is statistically significant at the 1 percent level. 
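

As a rough check on how the significance markers in Table 3 relate to the reported estimates and standard errors, the sketch below computes a test statistic and a two-sided p-value for the ITT rows. It assumes a standard normal reference distribution, which is a simplification; the authors' regression may rely on a different reference distribution or variance adjustment, so the values need not match the reported p-values exactly. The function name `two_sided_p` is illustrative.

```python
import math


def two_sided_p(estimate: float, std_error: float) -> float:
    """Two-sided p-value for estimate / std_error under a standard normal
    reference distribution (an assumption made for this sketch)."""
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2))


# ITT rows of Table 3: (effect size, standard error)
table3_itt = {
    "SRI": (-0.018, 0.044),
    "SBAC ELA/literacy": (-0.154, 0.062),
}

for outcome, (est, se) in table3_itt.items():
    print(f"{outcome}: z = {est / se:.2f}, p = {two_sided_p(est, se):.3f}")
```

Under this approximation, the SBAC ELA/literacy estimate falls below the 5 percent threshold and the SRI estimate does not, in line with the markers in the table.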


Based on the typical annual growth in reading of students in grades 6 through 8, the negative 
impact of being assigned to LLI on ELA/literacy scores is roughly equivalent to losing 5.5 
months of learning. To convert impacts into months of learning, we divided the impact estimate 
by the average of the typical annual growth in reading for students in grades 6 through 8 and 
assumed a nine-month school year. The accuracy of this conversion depends on the extent to 
which learning growth on the SBAC ELA/literacy test is similar to growth on the exams analyzed in Bloom
et al. (2008). Another way to interpret these results is in terms of regression-adjusted percentile 
scores (Figure 3). In spring 2016, treatment and control students had an average percentile score 
of 30 on the SBAC ELA/literacy test. One year later, students in the control group improved 
their scores by 11 percentile points, on average, while the performance of students assigned to 
LLI decreased by 1 percentile point. 
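

The months-of-learning conversion described above can be sketched as follows. The annual growth value used here (0.25 standard deviations) is a hypothetical placeholder chosen only for illustration; it is not taken from this paper, and the figure the authors used comes from Bloom et al. (2008). The function name is also illustrative.

```python
def months_of_learning(effect_size: float,
                       annual_growth: float,
                       school_year_months: int = 9) -> float:
    """Convert an effect size (in standard deviations) into months of learning
    by scaling it against typical annual growth over a school year."""
    return effect_size / annual_growth * school_year_months


# Hypothetical placeholder for average annual reading growth in grades 6-8.
ASSUMED_ANNUAL_GROWTH = 0.25

months = months_of_learning(-0.154, ASSUMED_ANNUAL_GROWTH)
print(f"Approximate months of learning: {months:.1f}")
```

Because the result scales inversely with the assumed annual growth, small changes in that benchmark shift the months estimate noticeably.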


⁹ Mean outcome test scores by treatment status are also reported in Table A.3 in the Appendix.
Figure 3. ITT impacts in percentile scores
Source: Authors’ calculations using data provided by OUSD.


Notes: Percentile scores range from 1 to 100 and are relative to all secondary students in OUSD. The control 
group averages were regression-adjusted using the estimated ITT effect sizes in Table 3. 


*Impact is statistically significant at the 5 percent level. 


As discussed in the implementation findings, several students in the treatment group did not 
participate in LLI as assigned, which could affect the program’s effectiveness. Of the 145 
students in the treatment group, 83 percent received at least four sessions. There was limited 
crossover from the control group, with only one student assigned to control receiving LLI. TOT 
analyses estimating the impact of receiving LLI showed that LLI had no discernible impact on 
SRI scores and a statistically significant, negative impact on SBAC ELA/literacy scores (Table 
3). Specifically, receiving at least four LLI sessions had an impact of -0.16 standard deviations 
on SBAC ELA/literacy scores (p-value = 0.01). 
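

One common way to move from an ITT estimate to a TOT estimate is to rescale by the treatment-control gap in receipt rates (often called a Bloom adjustment). The sketch below illustrates that idea only; it is an assumption that this simple rescaling resembles the authors' approach, which is not detailed in this section. The 83 percent receipt rate refers to the full treatment group, and the single control crossover is approximated as a rate of zero.

```python
def bloom_tot(itt: float,
              treat_receipt_rate: float,
              control_receipt_rate: float = 0.0) -> float:
    """Rescale an ITT estimate by the gap in receipt rates between the
    treatment and control groups."""
    gap = treat_receipt_rate - control_receipt_rate
    if gap <= 0:
        raise ValueError("receipt-rate gap must be positive")
    return itt / gap


# ITT estimate for SBAC ELA/literacy from Table 3, with the full-sample
# receipt rate of 0.83 (control crossover treated as zero for simplicity).
print(round(bloom_tot(-0.154, 0.83), 3))
```

This back-of-the-envelope value is somewhat larger in magnitude than the -0.163 shown in Table 3. That would be consistent with a higher receipt rate in the SBAC analytic sample than in the full treatment group, or with an estimation approach that differs from this simple rescaling.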


When we test for differences in impacts along student characteristics (English learner status 
and baseline reading level) and facets of implementation (teacher experience, instructional 
fidelity, and instructional setting), we find statistically significant differences only between 
students who received LLI in regularly scheduled classes and those who received LLI in a 
pullout group (Table 4). The estimated differences between these two groups are -0.21 standard 
deviations for the SRI (p-value = 0.01) and -0.27 standard deviations for the SBAC ELA/literacy 
test (p-value = 0.05), with students in pullout groups performing worse than those in regularly 
scheduled classes. The estimated impacts for students in regularly scheduled classes were 
positive for the SRI (0.06 standard deviations) and negative for SBAC ELA/literacy (-0.10 
standard deviations) but were not statistically significant. The other factors we examined were 
not associated with different impacts, although small sample sizes limited our ability to obtain 
precise estimates.


Table 4. Differences in impacts by subgroup


                                                                                   SRI           SBAC ELA/literacy
Impact for non-English learners (reference group)                                  [illegible]   [illegible]
Difference between impacts for English learners                                    [illegible]   [illegible]
Impact for students fewer than three years behind grade level at baseline          0.096         -0.176
(reference group)                                                                  (0.079)       (0.117)
Difference between impacts for students three or more years behind grade level     -0.161        0.036
at baseline                                                                        (0.101)       (0.146)
Impact for students taught by an experienced LLI teacher (reference group)         [illegible]   [illegible]
Difference between impacts for students taught by a first-year LLI teacher         -0.211        0.142
                                                                                   (0.112)       (0.126)
Impact for students taught LLI with lower fidelity (reference group)               [illegible]   [illegible]
Difference between impacts for students taught LLI with higher fidelity            [illegible]   [illegible]
Impact for students who did not receive LLI in a pullout group (reference group)   [illegible]   [illegible]
Difference between impacts for students who received LLI in a pullout group        [illegible]   [illegible]
Number of students analyzed                                                        263           192


Source: Authors’ calculations using data provided by OUSD. 


Notes: This table displays impact estimates in z-scores (standard deviations). Standard errors are displayed in 
parentheses below each impact estimate. 


* Impact is statistically significant at the 5 percent level. 
** Impact is statistically significant at the 1 percent level. 


Discussion 


Null and mixed findings are not uncommon in evaluations of adolescent literacy 
interventions. As of this writing, the What Works Clearinghouse has identified research meeting 
standards for 21 adolescent literacy interventions. Seven interventions were determined to have 
no effect on reading comprehension or general literacy achievement, 13 had mixed or potentially 
positive effects, and only one had positive effects (READ 180). Finding effective reading 
interventions for secondary students who are multiple years behind grade level may be especially 
difficult. An experimental study of four reading interventions found that two programs (READ 
180 and RISE) were effective with moderate-risk grade 9 students, but none had an impact on 
the achievement of grade 9 students reading below a grade 4 level (Lang et al. 2009). 


The finding of a negative effect on SBAC ELA/literacy performance in this study is more 
unusual. A possible explanation for this result is that, by participating in LLI, some students 
missed grade-level content covered in the SBAC ELA/literacy assessment. This theory is 
supported by the fact that the negative effects were greater for students who received LLI in a 
pullout group. It does not appear that the results can be explained by students in the control 
group receiving other, more effective literacy supports that students assigned to LLI did not
receive. Teachers recorded additional literacy supports outside of regular classroom instruction
for only 18 percent of students in the control group.¹⁰


Implementation challenges may have affected the results. Despite the prevalence of null and 
mixed findings in the adolescent literacy literature, one reason OUSD elected to implement LLI 
in secondary schools is that—consistent with the experience of many of its elementary schools, 
which have implemented the program for a number of years—research evidence suggests that 
the program is effective with students in early grades. However, aside from focusing on a 
markedly different grade span, these earlier studies featured other important distinctions related 
to implementation: students received a greater number of sessions through daily lessons, 
instructional fidelity ratings were high, and all teachers received eight days of professional 
development, along with continuing support throughout the period of implementation. In 
contrast, fidelity to LLI’s model was uneven and varied across the schools in the study.¹¹


Previous research suggests that experience can be an important factor in implementation. 
Most LLI teachers in this study were new to the program, yet research has found that it takes 
teachers one year to feel confident with a new instructional approach (Fullan 2001; Hall and 
Hord 2001). Lack of experience may also delay improvements in student outcomes. For 
example, a study of four reading interventions for students in grade 5 found that the only positive 
impact was observed in the second year of the study, after teachers had had one year of 
experience in using the program (James-Burdumy et al. 2012). On the other hand, the other three 
intervention programs in that study were still ineffective in the second year, highlighting the 
prevalence of null findings in this literature. 


There is also reason to think that additional professional development might improve 
implementation, potentially leading to better results. A systematic review of 33 studies of 
adolescent literacy programs determined that the approaches found to be effective provided 
extensive professional development and significantly changed teaching practices; further, 
programs designed to change daily teaching practices had greater research support than those 
focused on curriculum alone (Slavin et al. 2008). Like other intensive programs, LLI is
fast-paced and involves nuanced judgment calls that may require more extensive teacher training. In
addition, secondary literacy teachers are not typically trained in the foundational reading 
strategies that are part of LLI and might thus require more professional development than 
teachers in early grades. 


There may be other factors that affect the implementation of reading interventions at the 
secondary level. For example, despite the greater downsides to pulling secondary students out of 
grade-level instruction to receive LLI, scheduling a regular LLI class can present logistical 
challenges for middle and high schools. In addition, because of students’ low starting reading 
levels, most schools used LLI materials designed for students in elementary grades, which
might help explain why teachers skipped some lesson components (particularly phonics) and
why high school students were less engaged. Finally, according to LLI’s theory of change,
continuity of instruction from the regular classroom to intervention may be important to student
progress, yet such continuity is more common in elementary schools. For students in this study,
LLI instruction tended to be dramatically different from any other literacy instruction they received.


¹⁰ In the three most common examples, 10 control students participated in a reading pullout program using
nonfiction texts, 6 control students were enrolled in an English enrichment class, and 6 control students worked with
Newsela, a nonfiction personalized learning technology. It is unclear whether some students in the treatment group
also received additional supports.


¹¹ Other studies of literacy interventions have similarly found variable implementation across schools, despite
relatively rigid implementation guidelines (e.g., Levin et al. 2010; King 1994).


More broadly, the study’s findings highlight the importance of assessing whether the success 
of intervention programs in one context can be replicated with different populations and under 
different conditions. Although LLI was effective with early grades in multiple studies, it did not 
lead to positive results for secondary students in Oakland. In addition, given that implementation 
varied across teachers and schools, future studies of adolescent reading interventions should 
consider assessing treatment effect heterogeneity in their designs. Finally, the results of this 
study illustrate the challenges that schools and teachers face in implementing effective reading 
intervention programs at the secondary level.


Table A.1. Baseline student characteristics, SRI analytic sample


                                              Treatment                 Control                   Standardized
Baseline characteristic                       Mean         N (of 128)   Mean         N (of 135)   difference (percent)
Fall 2016 F&P BAS (numeric score, 1-26)       17.3         119          17.9         126          24.8
                                              (2.76)                    (2.56)
Fall 2016 SRI (z-score)                       [illegible]  128          [illegible]  135          21.4
Fall 2016 SMI (z-score)                       [illegible]  100          [illegible]  115          25.6
Spring 2016 SBAC ELA/literacy (z-score)       [illegible]  113          [illegible]  117          21.6
Spring 2016 SBAC math (z-score)               [illegible]  112          [illegible]  119          13.8
African American (%)                          35.9         128          34.1         135          3.9
Asian/Filipino/Pacific Islander (%)           6.25         128          9.63         135          12.5
Latino (%)                                    52.3         128          51.9         135          1.0
Male (%)                                      52.3         128          52.6         135          0.4
Eligible for free/reduced-price lunch (%)     95.3         128          92.7         135          11.0
English learner (%)                           37.5         128          28.5         135          19.2
Special education (%)                         13.3         128          10.2         135          9.5


Source: Authors’ calculations using data provided by OUSD. 


Notes: Standard deviations are displayed in parentheses. F&P letter scores were converted to numeric scores 
(e.g., 17 corresponds to the letter Q). SRI, SMI, and SBAC scores were converted to z-scores defined
relative to OUSD’s distribution of scores by test, grade, and administration. 


None of the differences is statistically significant.


Table A.2. Baseline student characteristics, SBAC analytic sample


                                              Treatment                 Control                   Standardized
Baseline characteristic                       Mean         N (of 95)    Mean         N (of 97)    difference (percent)
Fall 2016 F&P BAS (numeric score, 1-26)       16.5         86           17.3         87           28.9
                                              (2.53)                    (2.44)
Fall 2016 SRI (z-score)                       -0.479       95           -0.398       97           16.1
                                              (0.513)                   (0.495)
Fall 2016 SMI (z-score)                       -0.536       76           -0.391       82           17.3
                                              (0.828)                   (0.845)
Spring 2016 SBAC ELA/literacy (z-score)       -0.665       86           -0.532       87           21.1
                                              (0.626)                   (0.632)
Spring 2016 SBAC math (z-score)               -0.616       86           -0.559       89           8.2
                                              (0.607)                   (0.772)
African American (%)                          32.6         95           29.9         97           5.9
Asian/Filipino/Pacific Islander (%)           6.3          95           7.2          97           3.6
Latino (%)                                    56.8         95           57.7         97           1.8
Male (%)                                      51.6         95           48.0         97           7.2
Eligible for free/reduced-price lunch (%)     96.8         95           94.9         97           9.7
English learner (%)                           40.0         95           32.7         97           15.2
Special education (%)                         13.7         95           10.2         97           10.7


Source: Authors’ calculations using data provided by OUSD. 


Notes: Standard deviations are displayed in parentheses. F&P letter scores were converted to numeric scores 
(e.g., 17 corresponds to the letter Q). SRI, SMI, and SBAC scores were converted to z-scores defined
relative to OUSD’s distribution of scores by test, grade, and administration. 


None of the differences is statistically significant. 
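

The standardized differences in Tables A.1 and A.2 can be approximated with one common definition: the absolute difference in group means divided by the pooled standard deviation, expressed as a percentage. The formula below is an assumption, since the appendix does not state how the column was computed, but it appears to reproduce several rows of Table A.2 from the rounded means and standard deviations shown. The function name is illustrative.

```python
import math


def standardized_difference(mean_t: float, sd_t: float,
                            mean_c: float, sd_c: float) -> float:
    """Absolute difference in means as a percentage of the pooled standard
    deviation (square root of the average of the two group variances)."""
    pooled_sd = math.sqrt((sd_t ** 2 + sd_c ** 2) / 2)
    return abs(mean_t - mean_c) / pooled_sd * 100


# Rows from Table A.2: (treatment mean, treatment SD, control mean, control SD)
rows = {
    "Fall 2016 SRI": (-0.479, 0.513, -0.398, 0.495),
    "Spring 2016 SBAC ELA/literacy": (-0.665, 0.626, -0.532, 0.632),
}

for label, values in rows.items():
    print(f"{label}: {standardized_difference(*values):.1f} percent")
```

Rows computed from heavily rounded means, such as the F&P BAS score, may not match the reported column as closely.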
Table A.3. Outcome test scores by treatment status


                      Spring 2017 SRI                           Spring 2017 SBAC ELA/literacy
                      Mean      Standard deviation   N          Mean       Standard deviation   N
Control group         -0.485    0.538                128        -0.455     0.680                97
Assigned to LLI       -0.586    0.542                135        -0.700     0.612                95
Difference            -0.101                                    -0.245**


Source: Authors’ calculations using data provided by OUSD. 


Notes: This table displays test scores in z-scores. Each z-score represents the number of standard deviations 
above or below the districtwide mean score in that test, grade, and administration. 


** Difference is statistically significant at the 1 percent level.


Table A.4. ITT impact estimates in effect sizes, sensitivity analysis


                                                    SRI        SBAC ELA/literacy
Simplified model with unimputed baseline data       -0.038     -0.182**
                                                    (0.044)    (0.068)
Number of students analyzed                         263        192


Source: Authors’ calculations using data provided by OUSD. 


Notes: This table displays impact estimates in z-scores (standard deviations). Standard errors are displayed in 
parentheses below each impact estimate. 


** Impact is statistically significant at the 1 percent level.